Learning a Stopping Criterion for Active Learning for Word Sense Disambiguation and Text Classification
نویسندگان
چکیده
In this paper, we address the problem of knowing when to stop the process of active learning. We propose a new statistical learning approach, called minimum expected error strategy, to defining a stopping criterion through estimation of the classifier’s expected error on future unlabeled examples in the active learning process. In experiments on active learning for word sense disambiguation and text classification tasks, experimental results show that the new proposed stopping criterion can reduce approximately 50% human labeling costs in word sense disambiguation with degradation of 0.5% average accuracy, and approximately 90% costs in text classification with degradation of 2% average accuracy.
منابع مشابه
Active Learning for Word Sense Disambiguation with Methods for Addressing the Class Imbalance Problem
In this paper, we analyze the effect of resampling techniques, including undersampling and over-sampling used in active learning for word sense disambiguation (WSD). Experimental results show that under-sampling causes negative effects on active learning, but over-sampling is a relatively good choice. To alleviate the withinclass imbalance problem of over-sampling, we propose a bootstrap-based ...
متن کاملClinical Word Sense Disambiguation with Interactive Search and Classification
Resolving word ambiguity in clinical text is critical for many natural language processing applications. Effective word sense disambiguation (WSD) systems rely on training a machine learning based classifier with abundant clinical text that is accurately annotated, the creation of which can be costly and time-consuming. We describe a double-loop interactive machine learning process, named ReQ-R...
متن کاملA literature survey of active machine learning in the context of natural language processing
Active learning is a supervised machine learning technique in which the learner is in control of the data used for learning. That control is utilized by the learner to ask an oracle, typically a human with extensive knowledge of the domain at hand, about the classes of the instances for which the model learned so far makes unreliable predictions. The active learning process takes as input a set...
متن کاملThe learning vector quantization algorithm applied to automatic text classification tasks
Automatic text classification is an important task for many natural language processing applications. This paper presents a neural approach to develop a text classifier based on the Learning Vector Quantization (LVQ) algorithm. The LVQ model is a classification method that uses a competitive supervised learning algorithm. The proposed method has been applied to two specific tasks: text categori...
متن کاملPartially Supervised Sense Disambiguation by Learning Sense Number from Tagged and Untagged Corpora
Supervised and semi-supervised sense disambiguation methods will mis-tag the instances of a target word if the senses of these instances are not defined in sense inventories or there are no tagged instances for these senses in training data. Here we used a model order identification method to avoid the misclassification of the instances with undefined senses by discovering new senses from mixed...
متن کامل